Language identification of six languages based on a common set of broad phonemes
Authors
Kay M. Berkling, Etienne Barnard
Center for Spoken Language Understanding, Oregon Graduate Institute of Science and Technology, 20000 N.W. Walker Road, P.O. Box 91000, Portland, OR 97291-1000, USA
Abstract
We describe a system designed to recognize the language of an utterance spoken by any native speaker over the telephone. Our previous work based on language-specific phonemes [5] is extended to include sequences of all lengths of language-independent speech units. These units are derived by clustering phonemes across all languages in the system (Hindi, Spanish, English, German, Japanese, and Mandarin). Our language-identification results based on broad-phoneme occurrence statistics indicate 90% accurate distinction between English and Japanese, which is comparable to results obtained when using language-specific phonemes. By relaxing the precision of language-dependent phonemes into language-independent broad phonemes, we thus retain language-discriminative power. The degree to which the precision can be relaxed while retaining sequences of broad phonemes that can discriminate between languages indicates how accurately the phoneme segmenter and recognizer must recognize the incoming speech.
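The idea of scoring an utterance by broad-phoneme occurrence statistics can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the `BROAD_CLASS` map, the toy training strings, the labels `lang_a` and `lang_b`, the cap `max_n` on sequence length, and the add-one-smoothed scoring rule. The abstract does not say how the authors cluster phonemes or classify the statistics.

```python
from collections import Counter
from math import log

# Hypothetical phoneme-to-broad-class map (illustrative only, not from the paper).
BROAD_CLASS = {
    "p": "STOP", "t": "STOP", "k": "STOP", "b": "STOP", "d": "STOP", "g": "STOP",
    "s": "FRIC", "z": "FRIC", "f": "FRIC", "h": "FRIC",
    "m": "NASAL", "n": "NASAL",
    "a": "VOWEL", "e": "VOWEL", "i": "VOWEL", "o": "VOWEL", "u": "VOWEL",
}


def to_broad(phonemes):
    """Relax each phoneme label to its broad class; unknown labels become OTHER."""
    return [BROAD_CLASS.get(p, "OTHER") for p in phonemes]


def ngrams(seq, max_n):
    """Yield every contiguous subsequence of length 1..max_n as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(seq) - n + 1):
            yield tuple(seq[i:i + n])


def train(utterances, max_n=3):
    """Count broad-phoneme n-gram occurrences over one language's utterances."""
    counts = Counter()
    for utterance in utterances:
        counts.update(ngrams(to_broad(utterance), max_n))
    return counts


def score(counts, phonemes, max_n=3):
    """Sum add-one-smoothed log relative frequencies of the utterance's n-grams."""
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen n-grams
    return sum(
        log((counts[g] + 1) / (total + vocab))
        for g in ngrams(to_broad(phonemes), max_n)
    )


def identify(models, phonemes, max_n=3):
    """Return the language label whose occurrence statistics score highest."""
    return max(models, key=lambda lang: score(models[lang], phonemes, max_n))


# Toy phoneme strings, one character per phoneme (invented data).
models = {
    "lang_a": train([list("kataka"), list("sumino"), list("takana")]),
    "lang_b": train([list("stands"), list("kempt"), list("bandst")]),
}

print(identify(models, list("nomika")))
print(identify(models, list("tempt")))
```

Because only broad classes are compared, the two toy languages are separated purely by their sequence patterns (alternating consonant and vowel versus consonant clusters), which is the kind of discriminative information the abstract says survives the relaxation.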
Similar resources
Comparison of spectral methods for spoken language identification
Automatic spoken language identification is the task of identifying a language from the speech signal. Language identification systems fall into two categories: spectral-based methods and phonetic-based methods. In the former, short-time characteristics of the speech spectrum are extracted as a multi-dimensional vector. A statistical model of these features is then obtained for each language. The ...
Speaker Clustering for Multilingual Synthesis
Today, speech synthesizers in new languages are typically built by collecting several hours of well recorded speech in the target language. The time and effort involved in collection and correction can be prohibitive when lack of resources is common in addressing under-represented languages. An alternative method is to use acoustic data from an existing synthesizer in a different language and t...
Cross-language merged speech units and their descriptive phonetic correlates
The focus of this paper is to formulate an approach to merging phonemes across languages and to evaluate the resulting cross-language merged speech units on the basis of the traditional acoustic-phonetic descriptions of the phonemes. The methodology is based on the belief that some phonemes across a set of languages may be similar enough to be equated, contrasting traditional phonology which trea...
Language identification with limited resources
Language identification is an important issue in many speech applications. We address this problem from the point of view of classification of sequences of phonemes, given the assumption that each language has its own phonotactic characteristics. In order to achieve this classification, we have to decode the speech utterances in terms of phonemes. The set of phonemes must be the same for all th...
Language-identification Based on Cross-language Speech Units and Language-dependent Phonemes
This paper reports on results from ongoing research on language identification (LID) performed on three languages: American English, German and Spanish. The speech material is from the Oregon Graduate Institute Spontaneous Telephone Speech Corpus, OGI_TS. This leads to a baseline LID system which consists of three parallel phoneme recognisers, each of which is followed by three language mod...
Publication date: 1994